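The goal of this paper is to compare different learnable frontends in medical acoustics tasks. A framework has been implemented to classify human respiratory sounds and heartbeats into two categories, i.e., healthy or affected by pathologies. After obtaining two suitable datasets, we classify the sounds using two learnable frontends (LEAF and nnAudio) plus a non-learnable baseline frontend, i.e., Mel-filterbanks. The computed features are then fed into two different CNN models, namely VGG16 and EfficientNet. The frontends are carefully benchmarked in terms of number of parameters, computational resources, and effectiveness. This work demonstrates how the integration of learnable frontends in neural audio classification systems may improve performance, especially in the field of medical acoustics. However, the use of such frameworks increases the amount of data required. Consequently, they are useful if the amount of data available for training is large enough to support the feature learning process.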
translated by 谷歌翻译
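To make the non-learnable baseline in the abstract above concrete, the following is a minimal sketch of a triangular Mel filterbank in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function names, the HTK-style Hz-to-mel formula, and the parameter values in the usage note are all choices made here for the example.

```python
import math


def hz_to_mel(f):
    """Convert frequency in Hz to mels (HTK-style formula, an assumption here)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, fmin=0.0, fmax=None):
    """Return n_mels triangular filters, each a list of n_fft // 2 + 1 weights."""
    fmax = sample_rate / 2 if fmax is None else fmax
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    # n_mels + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    # Centre frequency of every FFT bin up to Nyquist.
    bin_freqs = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        left, center, right = edges[m], edges[m + 1], edges[m + 2]
        row = []
        for f in bin_freqs:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Triangle: rises from left to center, falls to right, zero outside.
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank
```

Multiplying a power spectrum by each row and summing gives one Mel-band energy per filter. A learnable frontend replaces these fixed weights with parameters trained jointly with the classifier, which is the trade-off the abstract benchmarks.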
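This paper proposes a machine learning-based approach aimed at alerting patients to possible respiratory diseases. Various types of pathologies may affect the respiratory system, potentially leading to severe disease and, in some cases, death. In general, effective prevention practices are considered major contributors to improving patients' health. The proposed method strives to provide an easy-to-use tool for the automatic diagnosis of respiratory diseases. Specifically, the method leverages a variational autoencoder architecture, which permits a training pipeline of limited complexity and the use of relatively small datasets. Importantly, it achieves an accuracy of 57%, which is in line with existing strongly supervised approaches.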
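Drawing on state-of-the-art psychological research, we note that a piano performance transcribed with existing Automatic Music Transcription (AMT) methods cannot be successfully resynthesized without affecting the artistic content of the performance. This is due to 1) the different mappings between MIDI parameters used by different instruments, and 2) the way musicians adapt to the surrounding acoustic environment. To address this issue, we propose a methodology for building acoustics-specific AMT systems that can model the adaptations musicians make to convey their interpretation. Specifically, we train models tailored to virtual instruments in a modular architecture that takes as input an audio recording and the corresponding aligned music score, and outputs the acoustics-specific velocity of each note. We test different model shapes and show that the proposed methodology generally outperforms the usual AMT pipeline, which does not consider the specificities of the instrument and of the acoustic environment. Interestingly, the methodology is straightforward to extend, since only slight effort is required to train models that infer other piano parameters, such as pedaling.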
It is well known that conservative mechanical systems exhibit local oscillatory behaviours due to their elastic and gravitational potentials, which completely characterise these periodic motions together with the inertial properties of the system. The classification of these periodic behaviours and their geometric characterisation are in an on-going secular debate, which recently led to the so-called eigenmanifold theory. The eigenmanifold characterises nonlinear oscillations as a generalisation of linear eigenspaces. With the motivation of performing periodic tasks efficiently, we use tools coming from this theory to construct an optimization problem aimed at inducing desired closed-loop oscillations through a state feedback law. We solve the constructed optimization problem via gradient-descent methods involving neural networks. Extensive simulations show the validity of the approach.
Our aim is to build autonomous agents that can solve tasks in environments like Minecraft. To do so, we use an imitation learning-based approach. We formulate our control problem as a search problem over a dataset of experts' demonstrations, where the agent copies actions from a similar demonstration trajectory of image-action pairs. We perform a proximity search over the BASALT MineRL-dataset in the latent representation of a Video PreTraining model. The agent copies the actions from the expert trajectory as long as the distance between the state representations of the agent and the selected expert trajectory from the dataset does not diverge. Then the proximity search is repeated. Our approach can effectively recover meaningful demonstration trajectories and show human-like behavior of an agent in the Minecraft environment.
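The search-and-copy loop described in this abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: latent states are plain lists of floats standing in for embeddings from a pretrained model, the distance is Euclidean, and `max_dist` is a hypothetical divergence threshold.

```python
import math


def l2(a, b):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_frame(query, trajectories):
    """Return (trajectory index, frame index) of the expert frame closest to query."""
    best, best_dist = None, math.inf
    for ti, traj in enumerate(trajectories):
        for fi, (latent, _action) in enumerate(traj):
            d = l2(query, latent)
            if d < best_dist:
                best, best_dist = (ti, fi), d
    return best


def act(agent_latents, trajectories, max_dist):
    """Copy expert actions, repeating the search when the agent diverges.

    trajectories: list of expert trajectories, each a list of (latent, action).
    max_dist: hypothetical threshold beyond which the followed trajectory is dropped.
    """
    actions = []
    ti = fi = None
    for z in agent_latents:
        need_search = (
            ti is None
            or fi >= len(trajectories[ti])                # expert trajectory ended
            or l2(z, trajectories[ti][fi][0]) > max_dist  # agent has diverged
        )
        if need_search:
            ti, fi = nearest_frame(z, trajectories)
        actions.append(trajectories[ti][fi][1])
        fi += 1  # keep following the same expert trajectory
    return actions
```

In an interactive setting the agent's latent states would arrive one at a time from the environment; the offline list here only keeps the example self-contained. The exhaustive scan in `nearest_frame` is also a simplification that a large dataset would likely replace with an index structure.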
Object instance segmentation is a key challenge for indoor robots navigating cluttered environments with many small objects. Limitations in 3D sensing capabilities often make it difficult to detect every possible object. While deep learning approaches may be effective for this problem, manually annotating 3D data for supervised learning is time-consuming. In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We introduce a zero-shot split for Tabletop Objects Dataset (TOD-Z) to enable this study and present a method that uses annotated objects to learn the ``objectness'' of pixels and generalize to unseen object categories in cluttered indoor environments. Our method, SupeRGB-D, groups pixels into small patches based on geometric cues and learns to merge the patches in a deep agglomerative clustering fashion. SupeRGB-D outperforms existing baselines on unseen objects while achieving similar performance on seen objects. Additionally, it is extremely lightweight (0.4 MB memory requirement) and suitable for mobile and robotic applications. The dataset split and code will be made publicly available upon acceptance.
Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming -- siloed in so called ``echo chambers'' of exclusively like-minded discussants. Yet, to date we lack sufficient means to measure viewpoint diversity in conversations. To this end, in this paper, we operationalize two viewpoint metrics proposed for recommender systems and adapt them to the context of social media conversations. This is the first study to apply these two metrics (Representation and Fragmentation) to real world data and to consider the implications for online conversations specifically. We apply these measures to two topics -- daylight savings time (DST), which serves as a control, and the more politically polarized topic of immigration. We find that the diversity scores for both Fragmentation and Representation are lower for immigration than for DST. Further, we find that while pro-immigrant views receive consistent pushback on the platform, anti-immigrant views largely operate within echo chambers. We observe less severe yet similar patterns for DST. Taken together, Representation and Fragmentation paint a meaningful and important new picture of viewpoint diversity.
This volume contains revised versions of the papers selected for the third volume of the Online Handbook of Argumentation for AI (OHAAI). Previously, formal theories of argument and argument interaction have been proposed and studied, and this has led to the more recent study of computational models of argument. Argumentation, as a field within artificial intelligence (AI), is highly relevant for researchers interested in symbolic representations of knowledge and defeasible reasoning. The purpose of this handbook is to provide an open access and curated anthology for the argumentation research community. OHAAI is designed to serve as a research hub to keep track of the latest and upcoming PhD-driven research on the theory and application of argumentation in all areas related to AI.
Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing the interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper, a novel evaluation approach is tested, based on: (i) a curated dataset of high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score; and (iii) a human evaluation task to distinguish, for a given text, the real and the generated images. The proposed method has been applied to the most recent models, i.e., DALLE2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully consistent with the CLIP-score. The dataset has been made available to the public.
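As a rough illustration of the quantitative metric mentioned above, a CLIP-style score is commonly computed as the cosine similarity between an image embedding and a text embedding, clipped at zero. The sketch below assumes the two embeddings have already been produced by some CLIP-like encoder. The function names and the optional `weight` rescaling factor are assumptions for this example, and the abstract does not state which exact formulation the authors use.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def clip_style_score(image_emb, text_emb, weight=1.0):
    """Clipped, optionally rescaled cosine similarity between two embeddings.

    image_emb, text_emb: precomputed embeddings (placeholders in this sketch).
    weight: hypothetical rescaling constant; 1.0 leaves the similarity unchanged.
    """
    return weight * max(cosine(image_emb, text_emb), 0.0)
```

A higher score indicates that the generated image lies closer to the prompt in the shared embedding space. Averaging the score per category would give a per-category comparison across models.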
Real-time monocular 3D reconstruction is a challenging problem that remains unsolved. Although recent end-to-end methods have demonstrated promising results, tiny structures and geometric boundaries are hardly captured, because insufficient supervision neglects spatial details and oversimplified feature fusion ignores temporal cues. To address these problems, we propose an end-to-end 3D reconstruction network, SST, which utilizes Sparse estimated points from a visual SLAM system as additional Spatial guidance and fuses Temporal features via a novel cross-modal attention mechanism, achieving more detailed reconstruction results. We propose a Local Spatial-Temporal Fusion module to exploit more informative spatial-temporal cues from multi-view color information and sparse priors, as well as a Global Spatial-Temporal Fusion module to refine the local TSDF volumes with the world-frame model from coarse to fine. Extensive experiments on ScanNet and 7-Scenes demonstrate that SST outperforms all state-of-the-art competitors, whilst keeping a high inference speed at 59 FPS, enabling real-world applications with real-time requirements.